RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning

TABLE 3.2
Accuracy (%) of PCNN-22 and PCNN-40, based on WRN-22 and WRN-40, respectively, on the CIFAR10 dataset for different values of λ.

Model      λ = 1e3    λ = 1e4    λ = 1e5    λ = 0
PCNN-22    91.92      92.79      92.24      91.52
PCNN-40    92.85      93.78      93.65      92.84

Despite the progress made in 1-bit quantization and network pruning, few works have combined the two in a unified framework so that they reinforce each other. It is necessary to introduce pruning techniques into 1-bit CNNs, since not all filters and kernels are equally important or worth quantizing in the same way. One potential solution is to prune the network first and then perform 1-bit quantization on the remaining weights, producing a more compressed network. However, this solution ignores the difference between binarized and full-precision parameters during pruning. A promising alternative is therefore to prune the quantized network, but designing a unified framework that combines quantization and pruning remains an open question.
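To make the "prune first, then binarize" option concrete, the following is a minimal sketch, assuming magnitude-based pruning and an XNOR-style per-tensor scale; the function name, keep ratio, and scaling scheme are illustrative assumptions rather than the RBCN procedure.

import torch

def prune_then_binarize(weight: torch.Tensor, keep_ratio: float = 0.5):
    # 1) Magnitude pruning: keep only the largest-|w| entries.
    flat = weight.abs().flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()
    mask = (weight.abs() >= threshold).float()

    # 2) 1-bit quantization of the surviving weights: sign() scaled by the
    #    mean absolute value of the kept weights (an assumed scaling choice).
    alpha = (weight.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    binary_weight = alpha * torch.sign(weight) * mask
    return binary_weight, mask

# Example: 16 filters of size 16x3x3, half of the weights pruned.
w = torch.randn(16, 16, 3, 3)
w_bin, mask = prune_then_binarize(w, keep_ratio=0.5)

As the sketch shows, pruning here is blind to the subsequent binarization, which is exactly the shortcoming noted above.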

To address these issues, we introduce the rectified binary convolutional network (RBCN) [148] to train a BNN, in which a novel learning architecture is presented within a GAN framework. Our motivation is based on the fact that GANs can match two data distributions, here those of the full-precision and 1-bit networks. This can also be viewed as distilling/exploiting the full-precision model to benefit its 1-bit counterpart. The primary binarization process for training RBCN is illustrated in Fig. 3.18, where the full-precision model and the 1-bit model (the generator) provide “real” and “fake” feature maps, respectively, to the discriminators.
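As a minimal sketch of this adversarial matching, assuming a small convolutional discriminator and a binary cross-entropy loss (both illustrative choices, not the exact RBCN objective), a per-layer discriminator could be trained to separate full-precision feature maps from binarized ones while the 1-bit generator is trained to fool it:

import torch
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    # Distinguishes "real" (full-precision) from "fake" (1-bit) feature maps.
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):
        return self.net(x)

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(d, real_fm, fake_fm):
    # The discriminator should output 1 for full-precision feature maps
    # and 0 for the binarized generator's feature maps.
    real_logit, fake_logit = d(real_fm), d(fake_fm.detach())
    return (bce(real_logit, torch.ones_like(real_logit)) +
            bce(fake_logit, torch.zeros_like(fake_logit)))

def generator_adversarial_loss(d, fake_fm):
    # The 1-bit generator tries to make its feature maps look "real".
    fake_logit = d(fake_fm)
    return bce(fake_logit, torch.ones_like(fake_logit))

In practice such an adversarial term would be added to the usual task loss (e.g., cross-entropy) of the 1-bit model rather than used on its own.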

FIGURE 3.18

This figure shows the framework for integrating the Rectified Binary Convolutional Network (RBCN) with Generative Adversarial Network (GAN) learning. The full-precision model provides “real” feature maps, while the 1-bit model (as the generator) provides “fake” feature maps to the discriminators, which try to distinguish “real” from “fake.” Meanwhile, the generator tries to fool the discriminators. As this process is repeated, the full-precision feature maps and kernels (across all convolutional layers) are fully exploited to enhance the capacity of the 1-bit model. Note that (1) the full-precision model is used only during training, not inference; and (2) after training, the learned full-precision filters W are discarded, and only the binarized filters Ŵ and the shared learnable matrices C are kept in RBCN to compute the feature maps at inference.
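As a concrete illustration of the inference path described in the caption, the sketch below computes feature maps from only the kept binarized filters Ŵ and the learnable matrices C; the shape of C and the way it rectifies the 1-bit filters (an element-wise scaling per filter) are assumptions for illustration, not the exact RBCN formulation.

import torch
import torch.nn.functional as F

def rbcn_inference_conv(x, w_bin, C, stride=1, padding=1):
    # Rectify the 1-bit filters with the learnable matrices C, then convolve.
    # Only w_bin (entries in {-1, +1}) and C are stored after training.
    rectified = w_bin * C
    return F.conv2d(x, rectified, stride=stride, padding=padding)

# Illustrative shapes: 16 binarized filters of size 16x3x3, with one
# learnable scale per filter broadcast over its entries (assumed layout).
w_bin = torch.sign(torch.randn(16, 16, 3, 3))
C = torch.rand(16, 1, 1, 1)
x = torch.randn(1, 16, 32, 32)
feature_maps = rbcn_inference_conv(x, w_bin, C)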